CRISPRstrand: predicting repeat orientations to determine the crRNA-encoding strand at CRISPR loci

Authors

  • Omer S. Alkhnbashi
  • Fabrizio Costa
  • Shiraz A. Shah
  • Roger A. Garrett
  • Sita J. Saunders
  • Rolf Backofen
Abstract

MOTIVATION: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, interspaced by stretches of variable-length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs.

RESULTS: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operating characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0.

AVAILABILITY: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
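
The orientation problem and the reported metric can be illustrated with a small Python sketch. It is not the CRISPRstrand implementation and does not use the graph kernel from the abstract; it only shows (a) that a repeat reported on one strand and its reverse complement are the two candidate orientations a predictor must choose between, and (b) how an AUC ROC value can be computed from labels and scores. The function names, the example repeat and the scores are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names, sequence and scores are assumptions,
# not taken from CRISPRstrand.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    A repeat found by a CRISPR locus finder may be reported on either
    strand, so the repeat and its reverse complement are the two
    candidate orientations.
    """
    return seq.translate(_COMPLEMENT)[::-1]


def auc_roc(labels, scores):
    """Compute AUC ROC by pairwise comparison.

    The value equals the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative
    one; ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    repeat = "GTTTC"  # made-up toy repeat, far shorter than real repeats
    print(reverse_complement(repeat))
    # Label 1 = repeat given in the crRNA-encoding orientation (hypothetical data).
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise AUC computation is quadratic in the number of examples, which is acceptable for a toy illustration; a rank-based formulation would be the usual choice for larger data sets.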

Similar resources

Modulation of CRISPR locus transcription by the repeat-binding protein Cbp1 in Sulfolobus

CRISPR loci are essential components of the adaptive immune system of archaea and bacteria. They consist of long arrays of repeats separated by DNA spacers encoding guide RNAs (crRNA), which target foreign genetic elements. Cbp1 (CRISPR DNA repeat binding protein) binds specifically to the multiple direct repeats of CRISPR loci of members of the acidothermophilic, crenarchaeal order Sulfolobale...

Accurate computational prediction of the transcribed strand of CRISPR non-coding RNAs

MOTIVATION CRISPR RNAs (crRNAs) are a type of small non-coding RNA that form a key part of an acquired immune system in prokaryotes. Specific prediction methods find crRNA-encoding loci in nearly half of sequenced bacterial, and three quarters of archaeal, species. These Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) arrays consist of repeat elements alternating with specifi...

CRISPR-Cas: the effective immune systems in the prokaryotes

Nearly all sequenced archaeal genomes and about half of eubacterial genomes have some sort of adaptive immune system, which enables them to target and cleave invading foreign genetic elements by an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of the CRISPR loci with multiple copies of a short repeat sequence separa...

Characterization of CRISPR RNA transcription by exploiting stranded metatranscriptomic data.

CRISPR-Cas systems are bacterial adaptive immune systems, each typically composed of a locus of cas genes and a CRISPR array of spacers flanked by repeats. Processed transcripts of CRISPR arrays (crRNAs) play important roles in the interference process mediated by these systems, guiding targeted immunity. Here we developed computational approaches that allow us to characterize the expression of...

Biogenesis pathways of RNA guides in archaeal and bacterial CRISPR-Cas adaptive immunity.

CRISPR-Cas is an RNA-mediated adaptive immune system that defends bacteria and archaea against mobile genetic elements. Short mature CRISPR RNAs (crRNAs) are key elements in the interference step of the immune pathway. A CRISPR array composed of a series of repeats interspaced by spacer sequences acquired from invading mobile genomes is transcribed as a precursor crRNA (pre-crRNA) molecule. Thi...

Journal title:

Volume 30, Issue 

Pages  -

Publication date: 2014